$$H(p, q) = \mathbb{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \,\|\, q).$$

For discrete $p$ and $q$ this means:

$$H(p, q) = -\sum_{x} p(x) \log q(x).$$
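As a quick numerical sanity check, here is a minimal NumPy sketch (the distributions `p` and `q` are made up for illustration) that evaluates the discrete formula and confirms that $H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$:

```python
import numpy as np

# Made-up discrete distributions over three outcomes (illustration only).
p = np.array([0.5, 0.3, 0.2])   # "true" distribution p
q = np.array([0.4, 0.4, 0.2])   # model distribution q

cross_entropy = -np.sum(p * np.log(q))       # H(p, q) = -sum_x p(x) log q(x)
entropy       = -np.sum(p * np.log(p))       # H(p)
kl_divergence =  np.sum(p * np.log(p / q))   # D_KL(p || q)

# The two printed values agree up to floating-point error.
print(cross_entropy, entropy + kl_divergence)
```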
The logistic loss in logistic regression is sometimes called the cross-entropy loss; it measures how closely the predictions $\hat{y}_n$ match the actual data labels $y_n$:
$$L(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} H(p_n, q_n) = -\frac{1}{N}\sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right].$$
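A minimal sketch of this loss in NumPy; the function name `binary_cross_entropy`, the labels `y`, and the predicted probabilities `y_hat` are illustrative choices, not part of any particular library:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average logistic (binary cross-entropy) loss over N examples.

    y_true holds labels y_n in {0, 1}; y_pred holds predicted probabilities
    y_hat_n. eps keeps the predictions away from 0 and 1 so log() stays finite.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Made-up labels and sigmoid outputs for illustration.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(y, y_hat))
```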
Because each data label $y^{(i)}$ has a fixed probability of either 0 or 1 (a one-hot target), in softmax regression the cross-entropy loss is expressed as:
$$J(\theta) = -\left[ \sum_{i=1}^{m} \sum_{k=1}^{K} \mathbf{1}\{y^{(i)} = k\} \log \frac{\exp\!\left(\theta^{(k)\top} x^{(i)}\right)}{\sum_{j=1}^{K} \exp\!\left(\theta^{(j)\top} x^{(i)}\right)} \right]$$
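Below is a small NumPy sketch of $J(\theta)$ under assumed conventions ($\theta$ stored as a $K \times d$ matrix of per-class parameters, $X$ as an $m \times d$ matrix of inputs); the function name and toy data are made up for illustration:

```python
import numpy as np

def softmax_cross_entropy(theta, X, y):
    """Cross-entropy loss J(theta) for softmax regression.

    theta: (K, d) matrix whose k-th row is theta^(k)
    X:     (m, d) matrix whose i-th row is x^(i)
    y:     (m,)   integer class labels in {0, ..., K-1}
    """
    logits = X @ theta.T                          # entry (i, k) = theta^(k)^T x^(i)
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The indicator 1{y^(i) = k} selects the log-probability of the true class.
    return -log_probs[np.arange(X.shape[0]), y].sum()

# Toy data with assumed shapes (m=5 examples, d=3 features, K=3 classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0, 2, 1, 0, 2])
theta = rng.normal(size=(3, 3))
print(softmax_cross_entropy(theta, X, y))
```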
References:

https://en.wikipedia.org/wiki/Cross_entropy
http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/